An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm

Authors

  • Martin Burtscher
  • Keshav Pingali
Abstract

This chapter describes the first CUDA implementation of the classical Barnes Hut n-body algorithm that runs entirely on the GPU. Unlike most other CUDA programs, our code builds an irregular tree-based data structure and performs complex traversals on it. It consists of six GPU kernels. The kernels are optimized to minimize memory accesses and thread divergence and are fully parallelized within and across blocks. Our CUDA code takes 5.2 seconds to simulate one time step with 5,000,000 bodies on a 1.3 GHz Quadro FX 5800 GPU with 240 cores, which is 74 times faster than an optimized serial implementation running on a 2.53 GHz Xeon E5540 CPU.
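For readers unfamiliar with the underlying method, the following is a minimal serial sketch of the textbook Barnes Hut idea in two dimensions: build a tree over the bodies, then approximate distant groups of bodies by their center of mass. It is not the six-kernel CUDA code described in the abstract. The names `Cell`, `build_tree`, and `accel`, the opening threshold `theta`, the softening term `eps2`, and the choice G = 1 are all assumptions made for illustration, and the sketch assumes distinct body positions inside the root square and positive masses.

```python
class Cell:
    """Square quadtree cell centered at (cx, cy) with half-width `half`."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0            # total mass of all bodies in this cell
        self.mx = self.my = 0.0    # mass-weighted position sums
        self.body = None           # (x, y, m) when the cell is a one-body leaf
        self.children = None       # four sub-cells once the cell is split

    def insert(self, x, y, m):
        if self.mass == 0.0 and self.children is None:
            self.body = (x, y, m)  # empty leaf: store the body here
        else:
            if self.children is None:
                # Occupied leaf: split it and push the old body down.
                h = self.half / 2
                self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                                 for dy in (-1, 1) for dx in (-1, 1)]
                old, self.body = self.body, None
                self._child_for(old[0], old[1]).insert(*old)
            self._child_for(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

    def _child_for(self, x, y):
        # Index matches the (dy, dx) order used when creating the children.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]


def build_tree(bodies, half=1.0):
    """Build a quadtree over bodies given as (x, y, mass) tuples."""
    root = Cell(0.0, 0.0, half)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root


def accel(cell, x, y, theta=0.5, eps2=1e-4):
    """Approximate acceleration at (x, y), with G = 1 (assumed)."""
    if cell.mass == 0.0:
        return 0.0, 0.0
    if cell.children is None:
        px, py, m = cell.body
    else:
        px, py, m = cell.mx / cell.mass, cell.my / cell.mass, cell.mass
    dx, dy = px - x, py - y
    d2 = dx * dx + dy * dy
    # Use the cell as a single point mass if it is a leaf, or if it is
    # small relative to its distance (cell width / distance < theta).
    if cell.children is None or (2 * cell.half) ** 2 < theta * theta * d2:
        inv = m / (d2 + eps2) ** 1.5
        return dx * inv, dy * inv
    ax = ay = 0.0
    for child in cell.children:
        cax, cay = accel(child, x, y, theta, eps2)
        ax += cax
        ay += cay
    return ax, ay
```

With `theta=0.0` every internal cell is opened, so the traversal reduces to the exact pairwise sum; larger values trade accuracy for fewer interactions. The recursive traversal shown here is the kind of irregular, data-dependent access pattern that the abstract says is hard to map onto a GPU.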

Similar resources

A Performance Comparison of Tree Data Structures for N-Body Simulation

We present a performance comparison of tree data structures for N-body simulation. The tree data structures examined are the balanced binary tree and the Barnes–Hut (BH) tree. Previous work has compared the performance of BH trees with that of nearest-neighbor trees and the fast multipole method, but the relative merits of BH and binary trees have not been compared systematically. In carrying...

N-Body Simulations Using Message Passing Parallel Computers

In this paper, we present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently, incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented b...

A Data Parallel Formulation of the Barnes-Hut Method for N-Body Simulations

This paper presents a data-parallel formulation for N-body simulations using the Barnes-Hut method. The tree-structured problem is first linearized by using space-filling curves. This process allows us to use standard data distributions and parallel array operations available in data-parallel languages. A new efficient HPF implementation of the Barnes-Hut method is presented in this paper, character...

Improving the firefly algorithm through the Barnes-Hut tree code

The firefly algorithm is a nature-inspired meta-heuristic algorithm that has a variety of applications such as multimodal optimization, clustering and finding good solutions for NP-hard problems. The original algorithm and modifications thereof have so far always calculated interactions between all fireflies individually, which leads to a complexity of O(n²). In this paper we present a novel appr...

Field Programmable Gate Array–based Implementation of an Improved Algorithm for Objects Distance Measurement (TECHNICAL NOTE)

In this work, the design of a low-cost, field programmable gate array (FPGA)-based digital hardware platform that implements image processing algorithms for real-time distance measurement is presented. Using embedded development kit (EDK) tools from Xilinx, the system is developed on a spartan3 / xc3s400, one of the common and low cost field programmable gate arrays from the Xilinx Spartan fami...

Publication date: 2011